Data Analysis of (Non-)Metric Proximities at Linear Costs
Abstract
Domain-specific (dis-)similarity or proximity measures, employed e.g. in alignment algorithms in bioinformatics, are often used to compare complex data objects and to cover domain-specific data properties. Lacking an underlying vector space, the data are given as pairwise (dis-)similarities. The few available methods for such data do not scale well to very large data sets. Kernel methods easily deal with metric similarity matrices, also at large scale, but costly transformations are necessary when starting from non-metric (dis-)similarities. We propose an integrative combination of Nyström approximation, potential double centering and eigenvalue correction to obtain valid kernel matrices at linear costs. Accordingly, effective kernel approaches become accessible for these data. Evaluation on several larger (dis-)similarity data sets shows that the proposed method achieves much better runtime performance than the standard strategy while keeping competitive model accuracy. Our main contribution is an efficient linear technique to convert (potentially non-metric) large-scale dissimilarity matrices into approximated positive semi-definite kernel matrices.
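The three ingredients named in the abstract (Nyström approximation, double centering, eigenvalue correction) can be combined roughly as in the following minimal sketch. It is an illustration under stated assumptions, not necessarily the authors' exact procedure: the function name psd_factor_from_landmarks, its parameters, and the QR-based way of reaching the eigenvalues are all hypothetical choices. It assumes squared dissimilarities to m landmark objects are available as an n x m block, and it never forms an n x n matrix, so the cost is linear in n for fixed m.

```python
import numpy as np


def psd_factor_from_landmarks(D_nm, landmark_idx, correction="clip", rcond=1e-10):
    """Hypothetical helper: return Phi (n x m) such that Phi @ Phi.T is a
    positive semi-definite approximation of the double-centered similarity
    matrix S = -1/2 * J D J, where D holds squared dissimilarities.

    D_nm         : (n, m) squared dissimilarities of all objects to the landmarks
    landmark_idx : indices (into the n objects) of the m landmarks
    correction   : "clip" sets negative eigenvalues to zero, "flip" takes |.|
    """
    D_nm = np.asarray(D_nm, dtype=float)
    D_mm = D_nm[landmark_idx, :]                    # m x m landmark block

    # Nystroem approximation: D ~ D_nm @ pinv(D_mm) @ D_nm.T
    # Double centering of that product only needs J @ D_nm, which is
    # D_nm with its column means subtracted (O(n m) work).
    C = D_nm - D_nm.mean(axis=0, keepdims=True)     # n x m
    W = -0.5 * np.linalg.pinv(D_mm, rcond=rcond)    # m x m
    W = 0.5 * (W + W.T)                             # enforce symmetry

    # S ~ C W C^T. With C = Q R the nonzero spectrum of S is that of the
    # small m x m matrix R W R^T, so the eigendecomposition costs O(n m^2).
    Q, R = np.linalg.qr(C)                          # Q: n x m, R: m x m
    small = R @ W @ R.T
    small = 0.5 * (small + small.T)
    evals, evecs = np.linalg.eigh(small)

    # Eigenvalue correction makes the approximated kernel valid (PSD).
    if correction == "clip":
        evals = np.maximum(evals, 0.0)
    elif correction == "flip":
        evals = np.abs(evals)
    else:
        raise ValueError("correction must be 'clip' or 'flip'")

    # Implicit kernel: K = Phi @ Phi.T, with Phi usable as explicit features.
    return (Q @ evecs) * np.sqrt(evals)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((200, 200))
    D = A + A.T                      # symmetric, generally non-metric
    np.fill_diagonal(D, 0.0)
    idx = np.arange(15)              # first 15 objects as landmarks (arbitrary)
    Phi = psd_factor_from_landmarks(D[:, idx], idx)
    print(Phi.shape)
```

The returned factor can be handed to a linear model directly, or the kernel entries can be computed on demand as inner products of its rows. Choosing the landmarks (here simply the first few objects) is left open in this sketch.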
Similar resources
Generalized multivalued $F$-contractions on non-complete metric spaces
In this paper, we explain a new generalized contractive condition for multivalued mappings and prove a fixed point theorem in metric spaces (not necessarily complete) which extends some well-known results in the literature. Finally, as an application, we prove that a multivalued function satisfying a general linear functional inclusion admits a unique selection fulfilling the corresp...
Local multidimensional scaling with controlled tradeoff between trustworthiness and continuity
In a visualization task, every nonlinear projection method needs to make a compromise between trustworthiness and continuity. In a trustworthy projection the visualized proximities hold in the original data as well, whereas a continuous projection visualizes all proximities of the original data. A multidimensional scaling method, curvilinear components analysis, is good at maximizing trustworth...
Local multidimensional scaling
In a visualization task, every nonlinear projection method needs to make a compromise between trustworthiness and continuity. In a trustworthy projection the visualized proximities hold in the original data as well, whereas a continuous projection visualizes all proximities of the original data. We show experimentally that one of the multidimensional scaling methods, curvilinear components anal...
Non-linear ergodic theorems in complete non-positive curvature metric spaces
Hadamard (or complete $CAT(0)$) spaces are complete, non-positive curvature, metric spaces. Here, we prove a nonlinear ergodic theorem for continuous non-expansive semigroup in these spaces as well as a strong convergence theorem for the commutative case. Our results extend the standard non-linear ergodic theorems for non-expansive maps on real Hilbert spaces, to non-expansive maps on Ha...
FUZZY LINEAR REGRESSION BASED ON LEAST ABSOLUTES DEVIATIONS
This study is an investigation of fuzzy linear regression model for crisp/fuzzy input and fuzzy output data. A least absolutes deviations approach to construct such a model is developed by introducing and applying a new metric on the space of fuzzy numbers. The proposed approach, which can deal with both symmetric and non-symmetric fuzzy observations, is compared with several existing models by...
Publication date: 2013